
arXiv.org e-Print Archive

An Empirical Bayes Approach for Multiple Tissue eQTL Analysis

Authors: Li Gen; Nobel Andrew B.; Rusyn Ivan; Shabalin Andrey A.; Wright Fred A.
Publication date: 06/09/2017

Expression quantitative trait loci (eQTL) analyses, which identify genetic markers associated with the expression of a gene, are an important tool in the understanding of diseases in human and other populations. While most eQTL studies to date consider the connection between genetic variation and expression in a single tissue, complex, multi-tissue data sets are now being generated by the GTEx initiative. These data sets have the potential to improve the findings of single tissue analyses by borrowing strength across tissues, and the potential to elucidate the genotypic basis of differences between tissues. In this paper we introduce and study a multivariate hierarchical Bayesian model (MT-eQTL) for multi-tissue eQTL analysis. MT-eQTL directly models the vector of correlations between expression and genotype across tissues. It explicitly captures patterns of variation in the presence or absence of eQTLs, as well as the heterogeneity of effect sizes across tissues. Moreover, the model is applicable to complex designs in which the set of donors can (i) vary from tissue to tissue, and (ii) exhibit incomplete overlap between tissues. The MT-eQTL model is marginally consistent, in the sense that the model for a subset of tissues can be obtained from the full model via marginalization. Fitting of the MT-eQTL model is carried out via empirical Bayes, using an approximate EM algorithm. Inferences concerning eQTL detection and the configuration of eQTLs across tissues are derived from adaptive thresholding of local false discovery rates, and maximum a posteriori estimation, respectively. We investigate the MT-eQTL model through a simulation study, and rigorously establish the FDR control of the local FDR testing procedure under mild assumptions appropriate for dependent data. Comment: accepted by Biostatistics
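The abstract above mentions adaptive thresholding of local false discovery rates without giving the rule. A minimal sketch of one common form of such a rule follows, assuming the local FDR values have already been estimated: reject the largest set of hypotheses with the smallest local FDRs whose average stays at or below the target level. The function name `lfdr_adaptive_threshold` and the parameter `alpha` are hypothetical, and this is not claimed to be the exact procedure used in the paper.

```python
def lfdr_adaptive_threshold(lfdr, alpha=0.05):
    """Return the sorted indices of hypotheses rejected by an adaptive
    local-FDR rule (illustrative sketch, not the paper's code).

    lfdr  : list of estimated local false discovery rates, one per test
    alpha : target false discovery rate
    """
    # Visit tests from most to least significant (smallest lfdr first).
    order = sorted(range(len(lfdr)), key=lambda i: lfdr[i])

    running_sum = 0.0
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lfdr[idx]
        # The mean lfdr of the rejected set estimates its FDR.
        if running_sum / rank <= alpha:
            n_reject = rank
        else:
            # The running mean of ascending values never decreases,
            # so no larger set can satisfy the bound.
            break

    return sorted(order[:n_reject])
```

For example, `lfdr_adaptive_threshold([0.01, 0.5, 0.02, 0.9, 0.1], alpha=0.05)` returns `[0, 2, 4]`: the three smallest values average about 0.043, while adding the fourth would raise the average above 0.05.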

Reconstruction of a low-rank matrix in the presence of Gaussian noise

Authors: Nobel Andrew B.; Shabalin Andrey A.
Publication date: 01/01/2013

This paper addresses the problem of reconstructing a low-rank signal matrix observed with additive Gaussian noise. We first establish that, under mild assumptions, one can restrict attention to orthogonally equivariant reconstruction methods, which act only on the singular values of the observed matrix and do not affect its singular vectors. Using recent results in random matrix theory, we then propose a new reconstruction method that aims to reverse the effect of the noise on the singular value decomposition of the signal matrix. In conjunction with the proposed reconstruction method we also introduce a Kolmogorov–Smirnov based estimator of the noise variance.

Computational tools for discovery and interpretation of expression quantitative trait loci

Authors: Rusyn Ivan; Shabalin Andrey A.; Wright Fred A.
Publication date: 01/01/2012

Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in the discovery of eQTLs, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.

arXiv.org e-Print Archive

Finding large average submatrices in high dimensional data

Authors: Nobel Andrew B.; Perou Charles M.; Shabalin Andrey A.; Weigman Victor J.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2009

The search for sample-variable associations is an important problem in the exploratory analysis of high dimensional data. Biclustering methods search for sample-variable associations in the form of distinguished submatrices of the data matrix. (The rows and columns of a submatrix need not be contiguous.) In this paper we propose and evaluate a statistically motivated biclustering procedure (LAS) that finds large average submatrices within a given real-valued data matrix. The procedure operates in an iterative-residual fashion, and is driven by a Bonferroni-based significance score that effectively trades off between submatrix size and average value. We examine the performance and potential utility of LAS, and compare it with a number of existing methods, through an extensive three-part validation study using two gene expression datasets. The validation study examines quantitative properties of biclusters, biological and clinical assessments using auxiliary information, and classification of disease subtypes using bicluster membership. In addition, we carry out a simulation study to assess the effectiveness and noise sensitivity of the LAS search procedure. These results suggest that LAS is an effective exploratory tool for the discovery of biologically relevant structures in high dimensional data. Software is available at https://genome.unc.edu/las/. Comment: Published in the Annals of Applied Statistics (http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/09-AOAS239 by the Institute of Mathematical Statistics (http://www.imstat.org).
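The abstract describes a Bonferroni-based significance score that trades off submatrix size against average value, but does not state its form. The sketch below shows one way such a score can be built, under the assumption that the entries of the m × n data matrix are i.i.d. standard normal under the null. In that case the average of a fixed k × l submatrix exceeds tau with probability Φ(-tau·sqrt(kl)), and a Bonferroni correction multiplies this by the number of k × l submatrices, C(m,k)·C(n,l). The function names and the exact formula are assumptions made for illustration, not taken from the listing.

```python
import math


def log_binom(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_normal_upper_tail(x):
    """log P(Z > x) for a standard normal Z."""
    if x < 30.0:
        return math.log(0.5 * math.erfc(x / math.sqrt(2.0)))
    # Asymptotic tail approximation, used where erfc would underflow.
    return -0.5 * x * x - math.log(x) - 0.5 * math.log(2.0 * math.pi)


def las_score(m, n, k, l, tau):
    """Bonferroni-style score of a k x l submatrix with average tau
    inside an m x n matrix (assumed form; larger is more significant)."""
    if not (1 <= k <= m and 1 <= l <= n):
        raise ValueError("submatrix dimensions must fit inside the matrix")
    z = tau * math.sqrt(k * l)  # standardized submatrix average
    log_p = log_binom(m, k) + log_binom(n, l) + log_normal_upper_tail(z)
    return -log_p
```

With this form, a larger submatrix is penalized through the binomial terms but rewarded through the sqrt(kl) factor, which is the size-versus-average trade-off the abstract refers to.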

Crossref

Helsingin yliopiston digitaalinen arkisto

Genome-wide association study meta-analysis of suicide death and suicidal behavior

Authors: Coon Hilary; DiBlasi Emily; FinnGen; Int Suicide Genetics Consortium; Li Qingqin S.; Palotie Aarno; Shabalin Andrey A.
Publication date: 01/02/2023

Suicide is a worldwide health crisis. We aimed to identify genetic risk variants associated with suicide death and suicidal behavior. Meta-analysis for suicide death was performed using 3765 cases from Utah and matching 6572 controls of European ancestry. Meta-analysis for suicidal behavior using data across five cohorts (n = 8315 cases and 256,478 psychiatric or populational controls of European ancestry) was also performed. One locus in neuroligin 1 (NLGN1) passing the genome-wide significance threshold for suicide death was identified (top SNP rs73182688, with p = 5.48 × 10^-8 before and p = 4.55 × 10^-8 after mtCOJO analysis conditioning on MDD to remove genetic effects on suicide mediated by MDD). Conditioning on suicidal attempts did not significantly change the association strength (p = 6.02 × 10^-8), suggesting suicide death specificity. NLGN1 encodes a member of a family of neuronal cell surface proteins. Members of this family act as splice site-specific ligands for beta-neurexins and may be involved in synaptogenesis. The NRXN-NLGN pathway was previously implicated in suicide, autism, and schizophrenia. We additionally identified ROBO2 and ZNF28 associations with suicidal behavior in the meta-analysis across five cohorts in gene-based association analysis using MAGMA. Lastly, we replicated two loci including variants near SOX5 and LOC101928519 associated with suicidal attempts identified in the ISGC and MVP meta-analysis using the independent FinnGen samples. Suicide death and suicidal behavior showed positive genetic correlations with depression, schizophrenia, pain, and suicidal attempt, and negative genetic correlation with educational attainment. These correlations remained significant after conditioning on depression, suggesting pleiotropic effects among these traits.
Bidirectional generalized summary-data-based Mendelian randomization analysis suggests that genetic risks for suicidal attempt and suicide death are both bi-directionally causal for MDD. Peer reviewed

seeQTL: a searchable database for human eQTLs

Authors: Andrey A. Shabalin; Fei Zou; Fred A. Wright; Kai Xia; Patrick F. Sullivan; Shunping Huang; Vered Madar; Wei Sun; Wei Wang; Yi-Hui Zhou
Publication venue: Oxford University Press
Publication date: 01/01/2012

Summary: seeQTL is a comprehensive and versatile eQTL database, including various eQTL studies and a meta-analysis of HapMap eQTL information. The database presents eQTL association results in a convenient browser, using both segmented local-association plots and genome-wide Manhattan plots.

Crossref

FastMap: Fast eQTL mapping in homozygous populations

Authors: Andrew B. Nobel; Andrey A. Shabalin; Daniel M. Gatti; Fred A. Wright; Ivan Rusyn; Tieu-Chong Lam
Publication venue: Oxford University Press
Publication date: 01/01/2009

Motivation: Gene expression Quantitative Trait Locus (eQTL) mapping measures the association between transcript expression and genotype in order to find genomic locations likely to regulate transcript expression. The availability of both gene expression and high-density genotype data has improved our ability to perform eQTL mapping in inbred mouse and other homozygous populations. However, existing eQTL mapping software does not scale well when the numbers of transcripts and markers are on the order of 10^5 and 10^5–10^6, respectively.

Crossref

Refinement of schizophrenia GWAS loci using methylome-wide association data

Authors: Aberg Karolina A.; Adkins Daniel E.; Chan Robin; Clark Shaunna L.; Hultman Christina M.; Kim Yunjung; Kumar Gaurav; Magnusson Patrik K.E.; McClay Joseph L.; Nerella Srilaxmi; Shabalin Andrey A.; Sullivan Patrick F.; van den Oord Edwin J.C.G.; Xie Linying
Publication date: 01/01/2015

Recent genome-wide association studies (GWAS) have made substantial progress in identifying disease loci. The next logical step is to design functional experiments to identify disease mechanisms. This step, however, is often hampered by the large size of loci identified in GWAS that is caused by linkage disequilibrium (LD) between SNPs. In this study, we demonstrate how integrating methylome-wide association study (MWAS) results with GWAS findings can narrow down the location for a subset of the putative causal sites. We use the disease schizophrenia as an example. To handle “data analytic” variation we first combined our MWAS results separately with two GWAS meta-analyses (N=32,143 and 21,953) that had largely overlapping samples but different data analysis pipelines. Permutation tests showed significant overlapping association signals between GWAS and MWAS findings. This significant overlap justified prioritizing loci based on the concordance principle. To further ensure that the methylation signal was not driven by chance, we successfully replicated the top three methylation findings near genes SDCCAG8, CREB1 and ATXN7 in an independent sample using targeted pyrosequencing. In contrast to the SNPs in the selected region, the methylation sites were largely uncorrelated, explaining why the methylation signals implicated much smaller regions (median size 78 bp). The refined loci showed considerable enrichment of genomic elements of possible functional importance and suggested specific hypotheses about schizophrenia etiology. Several hypotheses involved possible variation in transcription factor binding efficiencies.

Springer - Publisher Connector

Basal-like Breast cancer DNA copy number losses identify genes involved in genomic instability, response to therapy, and patient survival

Authors: Børresen-Dale Anne Lise; Chao Hann Hsiang; Grushko Tatyana; He Xiaping; Huo Dezheng; Kristensen Vessela N.; Nobel Andrew; Nordgard Silje H.; Nwachukwu Chika; Olopade Olufunmilayo I.; Parker Joel S.; Perou Charles M.; Shabalin Andrey A.; Weigman Victor J.
Publication date: 01/01/2011

Breast cancer is a heterogeneous disease with known expression-defined tumor subtypes. DNA copy number studies have suggested that tumors within gene expression subtypes share similar DNA copy number aberrations (CNA) and that CNA can be used to further sub-divide expression classes. To gain further insights into the etiologies of the intrinsic subtypes, we classified tumors according to gene expression subtype and next identified subtype-associated CNA using a novel method called SWITCHdna, using a training set of 180 tumors and a validation set of 359 tumors. Fisher’s exact tests, Chi-square approximations, and Wilcoxon rank-sum tests were performed to evaluate differences in CNA by subtype. To assess the functional significance of loss of a specific chromosomal region, individual genes were knocked down by shRNA, and drug sensitivity and DNA repair foci assays were performed. Most tumor subtypes exhibited specific CNA. The Basal-like subtype was the most distinct with common losses of the regions containing RB1, BRCA1, INPP4B, and the greatest overall genomic instability. One Basal-like subtype-associated CNA was loss of 5q11–35, which contains at least three genes important for BRCA1-dependent DNA repair (RAD17, RAD50, and RAP80); these genes were predominantly lost as a pair, or all three simultaneously. Loss of two or three of these genes was associated with significantly increased genomic instability and poor patient survival. RNAi knockdown of RAD17, or RAD17/RAD50, in immortalized human mammary epithelial cell lines caused increased sensitivity to a PARP inhibitor and carboplatin, and inhibited BRCA1 foci formation in response to DNA damage.
These data suggest a possible genetic cause for genomic instability in Basal-like breast cancers and a biological rationale for the use of DNA repair inhibitor related therapeutics in this breast cancer subtype. Electronic supplementary material: The online version of this article (doi:10.1007/s10549-011-1846-y) contains supplementary material, which is available to authorized users.